There is increasing adoption of artificial intelligence in drug discovery. However, existing works mainly use machine learning to exploit the chemical structures of molecules while ignoring the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
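The contrastive alignment of structures and text can be illustrated with a generic CLIP-style objective. This sketch assumes each molecule and each description has already been encoded into a fixed-length vector (the structure and text encoders are not shown), treats the i-th molecule and i-th text in a batch as the positive pair, and uses a symmetric InfoNCE loss with an assumed temperature of 0.1. All function names are hypothetical and this is not MoleculeSTM's released code; `retrieve` shows how zero-shot text-to-structure retrieval reduces to a nearest-neighbor search in the shared embedding space.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(mols: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """Cosine similarity: rows are molecule embeddings, columns are text embeddings."""
    m = [normalize(v) for v in mols]
    t = [normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(mi, tj)) for tj in t] for mi in m]


def cross_entropy_on_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(mols: List[Vector], texts: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: (molecule i, text i) is positive, every other pairing is a negative."""
    logits = [[s / temperature for s in row] for row in similarity_matrix(mols, texts)]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-molecule direction
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits_t))


def retrieve(query_text: Vector, candidate_mols: List[Vector]) -> int:
    """Zero-shot text-to-structure retrieval: index of the most similar molecule."""
    sims = similarity_matrix(candidate_mols, [query_text])
    return max(range(len(candidate_mols)), key=lambda i: sims[i][0])


if __name__ == "__main__":
    mols = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    print(contrastive_loss(mols, texts) < contrastive_loss(mols, texts[::-1]))  # True
    print(retrieve([0.2, 0.8], mols))  # 1
```

Correctly paired batches yield a lower loss than shuffled ones, which is the signal that pulls matching structures and descriptions together during training.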
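Reinforcement learning algorithms have performed well on challenging board games and video games. A growing body of research focuses on improving the generalization ability of reinforcement learning algorithms. The General Video Game AI Learning Competition aims to foster agents that can learn to play game levels not seen during training. This paper summarizes five years of the General Video Game AI Learning Competition. In each edition, three new games were designed. For each game, three test levels were generated by perturbing or combining two training levels. We then propose a novel reinforcement learning framework with dual observation for general video games, under the hypothesis that similar local information is more likely to be observed across different levels than similar global information. Therefore, instead of directly taking a single raw-pixel screenshot of the current game screen as input, our proposed framework takes encoded, transformed global and local observations of the game screen as two simultaneous inputs, with the aim of learning local information for playing new levels. The proposed framework is implemented with three state-of-the-art reinforcement learning algorithms and tested on the game set of the 2020 General Video Game AI Learning Competition. Ablation studies show the superior performance of using encoded, transformed global and local observations. The overall best agent is further used as a baseline for the 2021 edition of the competition.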
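The dual-observation idea can be sketched on a level encoded as a grid of tile IDs. This is a minimal illustration under stated assumptions: the integer tile encoding, the padding value `PAD`, and the window `radius` are hypothetical, and the further transformation the paper applies to both observations is not reproduced. The point is that the local view keeps a fixed size around the avatar regardless of level layout, while the global view reflects the whole level.

```python
from typing import List, Tuple

Grid = List[List[int]]

PAD = -1  # hypothetical tile ID for cells outside the level


def global_view(grid: Grid) -> Grid:
    """The whole level, one tile ID per cell (copied so the caller cannot mutate the level)."""
    return [row[:] for row in grid]


def local_view(grid: Grid, avatar: Tuple[int, int], radius: int) -> Grid:
    """A (2*radius+1) x (2*radius+1) window centred on the avatar, padded with PAD at borders."""
    r0, c0 = avatar
    h, w = len(grid), len(grid[0])
    return [
        [grid[r][c] if 0 <= r < h and 0 <= c < w else PAD
         for c in range(c0 - radius, c0 + radius + 1)]
        for r in range(r0 - radius, r0 + radius + 1)
    ]


def dual_observation(grid: Grid, avatar: Tuple[int, int], radius: int = 1) -> Tuple[Grid, Grid]:
    """The two simultaneous inputs fed to the agent: (global view, local view)."""
    return global_view(grid), local_view(grid, avatar, radius)


if __name__ == "__main__":
    level = [[1, 1, 1, 1],
             [1, 0, 2, 1],
             [1, 1, 1, 1]]
    g, l = dual_observation(level, avatar=(1, 1))
    print(l)  # [[1, 1, 1], [1, 0, 2], [1, 1, 1]]
```

In a real agent, each view would be passed to its own encoder branch and the resulting features combined before the policy head.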
Feature reuse has been a key technique in the design of lightweight convolutional neural networks (CNNs). Current methods usually use a concatenation operator to cheaply keep channel numbers large (and thus network capacity high) by reusing feature maps from other layers. Although concatenation is parameter- and FLOP-free, its computational cost on hardware devices is non-negligible. To address this, this paper offers a new perspective that realizes feature reuse via the structural re-parameterization technique. A novel hardware-efficient RepGhost module is proposed for implicit feature reuse via re-parameterization instead of the concatenation operator. Based on the RepGhost module, we develop an efficient RepGhost bottleneck and RepGhostNet. Experiments on the ImageNet and COCO benchmarks demonstrate that the proposed RepGhostNet is much more effective and efficient than GhostNet and MobileNetV3 on mobile devices. Notably, RepGhostNet surpasses GhostNet 0.5x by 2.5% Top-1 accuracy on the ImageNet dataset with fewer parameters and comparable latency on an ARM-based mobile phone.
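Structural re-parameterization can be illustrated with a small example. This sketch assumes a training-time block made of a per-channel (depthwise) 1D convolution branch plus a BatchNorm branch on the identity path, merged by addition rather than concatenation. Because inference-mode BatchNorm is a per-channel affine map `scale * x + shift`, it can be folded into the convolution by adding `scale` to the kernel's center tap and `shift` to the bias, leaving a single convolution at inference. This is a generic re-parameterization sketch with hypothetical names; it does not claim to reproduce RepGhost's exact layer layout.

```python
import math
from typing import List

Channels = List[List[float]]  # [channel][position]


def depthwise_conv1d(x: Channels, kernels: Channels, biases: List[float]) -> Channels:
    """Per-channel 1D convolution with kernel size 3 and zero 'same' padding."""
    out = []
    for xc, k, b in zip(x, kernels, biases):
        padded = [0.0] + xc + [0.0]
        out.append([sum(k[j] * padded[i + j] for j in range(3)) + b for i in range(len(xc))])
    return out


def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm: a fixed per-channel affine map."""
    return [[g * (v - m) / math.sqrt(s + eps) + bt for v in xc]
            for xc, g, bt, m, s in zip(x, gamma, beta, mean, var)]


def two_branch(x, kernels, biases, gamma, beta, mean, var, eps=1e-5):
    """Training-time block: conv branch + BN(identity) branch, merged by addition."""
    conv = depthwise_conv1d(x, kernels, biases)
    bn = batchnorm(x, gamma, beta, mean, var, eps)
    return [[a + b for a, b in zip(rc, rb)] for rc, rb in zip(conv, bn)]


def fuse(kernels, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN branch into the conv, using BN(x) = scale * x + shift per channel."""
    fused_k, fused_b = [], []
    for k, b, g, bt, m, s in zip(kernels, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        fused_k.append([k[0], k[1] + scale, k[2]])  # the identity path is a weight on the centre tap
        fused_b.append(b + bt - m * scale)
    return fused_k, fused_b


if __name__ == "__main__":
    x = [[1.0, -2.0, 3.0, 0.5], [0.0, 4.0, -1.0, 2.0]]
    kernels, biases = [[0.2, 0.5, -0.1], [1.0, -0.3, 0.4]], [0.1, -0.2]
    bn = ([1.5, 0.8], [0.3, -0.1], [0.2, 1.0], [0.5, 2.0])  # gamma, beta, mean, var
    train = two_branch(x, kernels, biases, *bn)
    infer = depthwise_conv1d(x, *fuse(kernels, biases, *bn))
    print(max(abs(a - b) for ra, rb in zip(train, infer) for a, b in zip(ra, rb)))
```

The printed difference is at floating-point rounding level: the two-branch training block and the single fused convolution compute the same function, so the extra branch costs nothing at inference.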
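Inductive link prediction on knowledge graphs aims to predict missing links between unseen entities, i.e., entities that do not appear during the training phase. Most previous works learn entity-specific embeddings, which cannot handle unseen entities. Several recent methods use enclosing subgraphs to obtain inductive ability. However, all of these works consider only the enclosing part of the subgraph without its complete neighboring relations, which causes some neighboring relations to be ignored and makes sparse subgraphs hard to handle. To address this problem, we propose Subgraph Neighboring Relations Infomax (SNRI), which fully exploits complete neighboring relations from two aspects: neighboring relational features for node features and neighboring relational paths for sparse subgraphs. To further model neighboring relations in a global way, we innovatively apply mutual information (MI) maximization to knowledge graphs. Experiments show that SNRI outperforms existing state-of-the-art methods by a large margin on the inductive link prediction task, and verify the effectiveness of exploring complete neighboring relations in a global way to characterize node features and to reason over sparse subgraphs.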
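The difference between an enclosing subgraph and the complete neighboring relations of its nodes can be seen on a toy graph. This sketch uses a common construction of the enclosing subgraph, the intersection of the k-hop neighborhoods of the head and tail entities, and then collects, for every subgraph node, all relations incident to it, including edges that leave the subgraph. The triple format and function names are assumptions, and SNRI's feature construction and mutual-information objective are not reproduced.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_adjacency(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Undirected entity adjacency, ignoring relation types."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for h, _, t in triples:
        adj[h].add(t)
        adj[t].add(h)
    return adj


def k_hop(adj: Dict[str, Set[str]], source: str, k: int) -> Set[str]:
    """All entities within k hops of source (breadth-first search)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def enclosing_subgraph(triples: List[Triple], head: str, tail: str, k: int):
    """Nodes in both k-hop neighborhoods, plus the triples among them.

    Note: the target link itself is kept here; training pipelines often remove it.
    """
    adj = build_adjacency(triples)
    nodes = k_hop(adj, head, k) & k_hop(adj, tail, k)
    edges = [tr for tr in triples if tr[0] in nodes and tr[2] in nodes]
    return nodes, edges


def neighboring_relations(triples: List[Triple], nodes: Set[str]) -> Dict[str, Set[str]]:
    """Every relation incident to each subgraph node, including edges leaving the subgraph."""
    rels: Dict[str, Set[str]] = defaultdict(set)
    for h, r, t in triples:
        if h in nodes:
            rels[h].add(r)
        if t in nodes:
            rels[t].add(r)
    return dict(rels)


if __name__ == "__main__":
    kg = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c"), ("c", "r4", "d")]
    nodes, edges = enclosing_subgraph(kg, "a", "c", k=1)
    print(sorted(nodes))                                # ['a', 'b', 'c']
    print(sorted(neighboring_relations(kg, nodes)["c"]))  # ['r2', 'r3', 'r4']
```

Relation `r4` never appears inside the enclosing subgraph, yet it is part of entity `c`'s neighborhood; this is the kind of information the abstract says enclosing-subgraph methods drop.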
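Global spatial statistics, aggregated along entire spatial dimensions, are widely used in top-performing image restorers. Examples include the mean and variance in Instance Normalization (IN) and the mean in Squeeze-and-Excitation (SE), the latter applied in MPRNet. This paper first shows that statistics aggregated over patch-based features during training and over entire-image features during testing can be distributed very differently, which degrades the performance of image restorers. This issue has been largely overlooked by previous works. To solve it, we propose a simple approach, the Test-time Local Statistics Converter (TLSC), which changes the region of the statistics aggregation operation from global to local, only at test time. Without retraining or fine-tuning, our approach significantly improves the performance of image restorers. In particular, by extending a state-of-the-art model with TLSC, MPRNet gains 0.65 dB in PSNR on the GoPro dataset, reaching 33.31 dB and exceeding the previous best result by 0.6 dB. Moreover, we simply apply TLSC to a high-level vision task, semantic segmentation, and achieve competitive results. Extensive quantitative and qualitative experiments demonstrate that TLSC resolves this issue at marginal cost while yielding significant gains. The code is available at https://github.com/megvii-research/tlsc.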
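The global-to-local switch can be sketched for a single channel. The sketch contrasts the train-time global mean (as in SE's global average pooling) with a test-time local mean computed over a window whose size we assume equals the training patch size. At image borders the window is shifted rather than shrunk, so it always stays full-size; this boundary rule and all function names are our assumptions, not necessarily those of the released TLSC code. A summed-area table keeps each window sum constant-time.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, [row][col]


def global_mean(x: FeatureMap) -> float:
    """Train-time statistic: the mean over the whole (patch-sized) feature map."""
    return sum(map(sum, x)) / (len(x) * len(x[0]))


def _window_start(pos: int, size: int, length: int) -> int:
    """Start of a window of `size` roughly centred on `pos`, shifted to stay inside [0, length)."""
    return min(max(0, pos - size // 2), max(0, length - size))


def local_means(x: FeatureMap, win_h: int, win_w: int) -> FeatureMap:
    """Test-time statistic: per-position mean over a win_h x win_w window."""
    h, w = len(x), len(x[0])
    # Summed-area table with a zero row and column prepended.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            sat[i + 1][j + 1] = x[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j]
    out = []
    for i in range(h):
        r0 = _window_start(i, win_h, h)
        r1 = min(h, r0 + win_h)
        row = []
        for j in range(w):
            c0 = _window_start(j, win_w, w)
            c1 = min(w, c0 + win_w)
            total = sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]
            row.append(total / ((r1 - r0) * (c1 - c0)))
        out.append(row)
    return out


if __name__ == "__main__":
    wide = [[0.0, 0.0, 4.0, 4.0]]
    print(global_mean(wide))        # 2.0
    print(local_means(wide, 1, 2))  # [[0.0, 0.0, 2.0, 4.0]]
```

When the test input is no larger than the window, every local mean equals the global mean, so the converter only changes behavior on the full-resolution test images where the train/test statistics mismatch arises.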
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is simple, yet its performance is strong: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
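The first insight, building class centers from support masks to re-weight query features, can be illustrated with masked average pooling. In this sketch the class center is the per-channel mean of support features inside the support mask, and each query channel is then scaled by a sigmoid gate computed from that center. The sigmoid gating rule, the nested-list tensor layout, and the function names are assumptions made for illustration; RefT's actual dynamic weighting module may differ.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # [channel][row][col]
Mask = List[List[int]]             # [row][col], 1 = object pixel


def masked_average_pool(feat: Feature, mask: Mask) -> List[float]:
    """Class centre: per-channel mean of support features inside the support mask."""
    area = sum(map(sum, mask))
    if area == 0:
        raise ValueError("support mask is empty")
    return [sum(v * m for frow, mrow in zip(ch, mask) for v, m in zip(frow, mrow)) / area
            for ch in feat]


def reweight_query(query: Feature, centre: List[float]) -> Feature:
    """Scale each query channel by sigmoid(centre[c]) (a hypothetical gating choice)."""
    gates = [1.0 / (1.0 + math.exp(-c)) for c in centre]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(query, gates)]


if __name__ == "__main__":
    support = [[[1.0, 3.0], [5.0, 7.0]],
               [[2.0, 2.0], [0.0, 4.0]]]
    support_mask = [[1, 0],
                    [0, 1]]
    centre = masked_average_pool(support, support_mask)
    print(centre)  # [4.0, 3.0]
    print(reweight_query([[[2.0, 4.0]]], [0.0]))  # [[[1.0, 2.0]]]
```

Pooling only inside the mask keeps background pixels out of the class center, which is why the support mask matters for this step.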
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (the student) is encouraged to mimic the behavior of a computationally expensive GNN (the teacher). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits, and can even exaggerate, the bias of the teacher GNN. To handle this problem, we take initial steps toward fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structure and can thus be easily adapted to various GNN-based KD frameworks. Extensive experiments on multiple real-world datasets corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
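To make the teacher-student setting concrete, the sketch below shows two standard ingredients that a fair-KD pipeline would combine: the Hinton-style distillation loss on temperature-softened outputs, and the statistical parity gap, a common group-fairness measure on binary predictions. Both are generic textbook formulations chosen for illustration; RELIANT's own debiasing mechanism is not described in the abstract and is not reproduced here, and the function names and default temperature are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """Distillation loss: T^2 * KL(teacher || student) on temperature-softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def statistical_parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)| for binary predictions and a binary attribute.

    Assumes both groups are non-empty.
    """
    rates = []
    for g in (0, 1):
        members = [p for p, s in zip(preds, groups) if s == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


if __name__ == "__main__":
    print(kd_loss([0.0, 0.0], [2.0, 0.0]) > 0)                # True
    print(statistical_parity_gap([1, 1, 0, 0], [0, 0, 1, 1]))  # 1.0
```

A student trained only on `kd_loss` copies the teacher's predictions, including any group disparity that `statistical_parity_gap` would reveal; a fairness-aware distillation framework has to add a term or mechanism that works against that gap.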
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when models share the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of EMO over state-of-the-art methods; e.g., EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models while balancing model accuracy and efficiency well.
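One way to read the Meta-Mobile Block abstraction is `y = x + Shrink(F(Expand(x)))`, where the choice of the operator `F` determines the concrete block. The toy sketch below works on a short sequence of tokens, each a vector of channels; the ReLU activation, the kernel size of 3, and all names are assumptions, and it is not the iRMB implementation (which the abstract describes as combining CNN-like local modeling with Transformer-like dynamic modeling). Passing the identity as `F` gives an FFN-style block, while a depthwise convolution over neighboring tokens gives an inverted-residual-style block.

```python
from typing import Callable, List

Tokens = List[List[float]]  # [token][channel]
Matrix = List[List[float]]  # [out_channel][in_channel]


def linear(x: Tokens, w: Matrix) -> Tokens:
    """Pointwise (1x1) projection applied independently to every token."""
    return [[sum(wi * xi for wi, xi in zip(row, tok)) for row in w] for tok in x]


def relu(x: Tokens) -> Tokens:
    return [[max(0.0, v) for v in tok] for tok in x]


def depthwise_token_conv(x: Tokens, k: List[float]) -> Tokens:
    """Local mixing: each channel is convolved over neighbouring tokens (size 3, zero padding)."""
    n, c = len(x), len(x[0])
    zero = [0.0] * c
    padded = [zero] + x + [zero]
    return [[sum(k[j] * padded[i + j][ch] for j in range(3)) for ch in range(c)]
            for i in range(n)]


def meta_mobile_block(x: Tokens, w_expand: Matrix, w_shrink: Matrix,
                      operator: Callable[[Tokens], Tokens]) -> Tokens:
    """x + Shrink(F(Expand(x))): the operator F decides what kind of block this is."""
    hidden = relu(linear(x, w_expand))  # expand C -> lambda * C channels
    mixed = operator(hidden)            # F: identity, depthwise conv, attention, ...
    out = linear(mixed, w_shrink)       # shrink lambda * C -> C channels
    return [[a + b for a, b in zip(xa, ob)] for xa, ob in zip(x, out)]


if __name__ == "__main__":
    seq = [[1.0, -1.0], [0.5, 2.0], [-3.0, 0.0]]
    w_e = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # expansion ratio 2
    w_s = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    ffn_like = meta_mobile_block(seq, w_e, w_s, lambda h: h)
    irb_like = meta_mobile_block(seq, w_e, w_s,
                                 lambda h: depthwise_token_conv(h, [0.25, 0.5, 0.25]))
    print(ffn_like)
    print(irb_like)
```

Keeping the expand and shrink projections fixed while swapping `F` is what lets one framework cover both families; the abstract's argument is that the choice of `F` is what drives performance.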